Amendments & Claim Status

[1] This office action is responsive to the Amendment received Mar. 3, 2009. Claims 1-32
remain pending.

Claim Rejections - 35 U.S.C. § 112 

[2] In response to Amendment at 2 and 5, the previous § 112 rejections are withdrawn.

Claim Rejections - 35 U.S.C. § 101 

[3] In response to Amendment at 2 and 5, the previous § 101 rejections are withdrawn. 

Response to Arguments 

Remarks Unpersuasive regarding Rejections Under 35 U.S.C. § 102(b)

[4] Amendment at 10-14 regarding 35 U.S.C. § 102 rejections with respect to claims 1-3, 5-6,
14, 18-19, and 23-24 have been respectfully and fully considered, but are not found
persuasive. 

Granted, as to Claim 1, the Office Action states that providing an image sequence of at least one 
image frame is taught in FIG. 2, element 201 and FIG. 3, elements 301-308. But FIG. 3 refers to 
training images for training the Foote system shown in FIG. 2, not an image frame of element 201. 
Additionally, the Office Action states that providing a preferred number of classes of objects is 
taught as a "pre-defined set of classes" in Col. 5, lines 14-16 to be identified within the image
sequence. But a "predefined set of classes" is not the same as a preferred number of classes, as the
applicants claim. In the applicants' claimed invention it is not necessary to define what type of a
class is sought, all that is needed is the preferred number of classes sought, which requires much 
less information to specify than a class itself. 

Amendment at 12-13. 

To better support the finding that Foote et al. shows input video frames containing at least one
image frame of a scene, the Examiner has replaced the training-set reference with 6:17-19.

However, though not ad verbum, a "predefined set of classes" is equivalent to providing
"a preferred number of classes". A predefined set of classes necessarily (i) has a number
associated with the quantity of classes; and (ii) is preferred, because the classes were
originally predefined (as opposed to random).

If the intent is to reduce information by providing only a numerical value, the Examiner suggests
further defining this in the claim (e.g., "providing only a numerical value representing a
preferred number of classes of objects. . ."), if supported in the original disclosure, because
"only a preferred number" is broad enough to allow the Examiner's interpretation.
This argument is relevant to claim 23. Amendment at 13-14.



Furthermore, Claim 1 includes the limitation of automatically decomposing the image sequence 
into the preferred number of classes of objects, processing data and learning generative models at 
substantially the same rate the input data is received. Cited Column 5, lines 14-16, does not teach 
automatically decomposing the image sequence into the preferred number of classes of objects in 
near real-time by processing data and learning generative models at substantially the same rate 
the input data is received. Nothing at all is stated in this paragraph regarding processing in near 
real-time. In fact, as stated by the Examiner, Foote does not teach automatically decomposing the 
image sequence into the preferred number of classes of objects in near real-time processing data 
and learning generative models at substantially the same rate the input data is received, because
Foote segments a full video into individual presentations based on the extent of each presenter's 
speech. (Abstract) Hence, Foote can only segment a video file with corresponding audio after it 
has been recorded, not as the data is being acquired or input. 

Amendment at 13. 

However, claim 1 recites "processing the provided image sequence and computing the
single set of model parameters at a same rate that the image sequence is provided". This has
been interpreted to mean that the processing, computing, and providing of the image sequence
occur at the same "rate" by definition.
"Rate" is defined as "a fixed ratio between two things" and "a quantity,
amount, or degree of something measured per unit of something else". "Rate." Def. 3a. n. and
Def. 4a. n. Merriam-Webster Dictionary, 11th ed. 2009 (available at
http://www.merriam-webster.com/dictionary/rate[2]).

If rate is interpreted as above, performing the functions themselves (i.e., completing the
computing, processing, and providing) once per image sequence is a "rate". A rate may
also be performing the functions for the image sequence per computer (here, one computer performs
all functions). The processing, computing, and providing occur at an equal rate because they are
all performed on one computer per entire image sequence.

The Examiner suggests (1) further limiting what is meant by "rate", e.g.,
". . .computing the single set of model parameters at a substantially same time a same rate that the
image sequence is provided" or (2) removing and positively reciting that all functions are
performed at the same time ". . .computing the single set of model parameters at a same rate that
the image sequence is provided while providing the image sequence", if supported in the original
disclosure. This argument is relevant to claim 23. Amendment at 13-14.

It does not teach automatically decomposing each image sequence into a generative model (e.g., a 
model of how the observed data could have been generated) with each generative model including 
a set of model parameters that represent at least one object class for each image sequence using an 
expectation-maximization analysis that employs a Viterbi analysis, wherein each generative model 
is computed at a same rate that the at least one image sequence is acquired. 

Amendment at 13-14. 

However, Foote et al. does teach a generative model as Applicant defines it, i.e., a model of
how the observed data could have been generated (if the observed data is generated by a model,
as in Foote et al. at fig. 3, items 301-308, for example, then the model is generative). The
Examiner suggests amending the claim to further define the generative model so as to
differentiate it from the prior art of record.

Remarks Unpersuasive regarding Rejections Under 35 U.S.C. § 103(a)

[5] Amendment at 14-16 regarding 35 U.S.C. § 103(a) rejections with respect to claims 4, 7,
and 27 have been respectfully and fully considered, but are not found persuasive. See above.

[6] Amendment at 16-18 regarding 35 U.S.C. § 103(a) rejections with respect to claims 8-10,
13, 15-17, and 28-31 have been respectfully and fully considered, but are not found persuasive. 
See above. 

[7] Amendment at 18-19 regarding 35 U.S.C. § 103(a) rejections with respect to claims 11-12
have been respectfully and fully considered, but are not found persuasive. See above.

[8] Amendment at 19-20 regarding 35 U.S.C. § 103(a) rejections with respect to claims 20-21
and 25-26 have been respectfully and fully considered, but are not found persuasive. See
above.

[9] Amendment at 20-22 regarding 35 U.S.C. § 103(a) rejections with respect to claim 32 
have been respectfully and fully considered, but are not found persuasive. See above. 

Claim Rejections - 35 U.S.C. § 102 
[10] The following is a quotation of the appropriate paragraphs of 35 U.S.C. § 102 that form 
the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(a) the invention was known or used by others in this country, or patented or described in a printed 
publication in this or a foreign country, before the invention thereof by the applicant for a patent. 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

(e) the invention was described in (1) an application for patent, published under section 122(b), by
another filed in the United States before the invention by the applicant for patent or (2) a patent granted
on an application for patent by another filed in the United States before the invention by the applicant
for patent, except that an international application filed under the treaty defined in section 351(a) shall
have the effects for purposes of this subsection of an application filed in the United States only if the
international application designated the United States and was published under Article 21(2) of such
treaty in the English language.

Foote et al. 

[11] Claims 1-3, 5-6, 14, 18-19, and 23-24 are rejected under 35 U.S.C. § 102(b) as being 
anticipated by U.S. Patent No. 6,404,925 (issued Jun. 11, 2002, hereinafter "Foote et al"). 

Regarding claim 1, Foote et al. discloses a system (fig. 1; fig. 2) for automatically 
decomposing an image sequence (fig. 2, item 201), comprising a computer-readable storage 
medium (fig. 1, items 103, 107-108) storing a program that when executed (fig. 1, items 102, 
109) causes: 
a computer (fig. 1) to perform the following process actions,

providing an image sequence (fig. 2, item 201) of at least one image frame (6:17-19) of a 
scene (e.g., scene in fig. 3); 

providing only a preferred number of classes of objects (fig. 2, items 202-205; "pre-defined
set of classes" in 5:14-16; "Examples of video classes include close-ups of people,
crowd scenes, and shots of presentation material. . ." at 5:16-20; "[s]hot [c]ategory" of TABLE 1
at col. 12) to be identified (fig. 12, item 1204) within the image sequence;

automatically decomposing (fig. 2, item 208; fig. 12, items 1202-1203; the final outcome 
at fig. 25 of three classes G,A,B) the image sequence (fig. 2, item 201) into the preferred number 
of classes of objects ("segmenting. . .into a pre-defined set of classes" in 5:14-16; the "Training 
Data" and "Test Data" of TABLE 1 at col. 12), 

using probabilistic inference (fig. 23; " hidden Markov model to be used in the method for 
classifying a video according to the present invention. Each of the image classes G, A, and B, are 
modeled using Gaussian distributions" (emphasis added) at 16:49-55; computing posterior 
distribution of variables using the hidden Markov models; "[t]he similarity between a given 
frame and the query is computed during the Viterbi algorithm as the posterior probability of the 
query state or states" at 18:42-44) and learning ("learning of the actual data points" at 15:34) to 
compute a single set of model parameters (the single set of mean visual appearances and 
variances in the "Gaussian distributions" at 6:32-33) comprising a mean visual appearance and 
variance (e.g., "Gaussian distributions having different means and variances" at 7:59-60; fig. 4; 
i.e., the model parameters from the hidden Markov model comprise means and variances of each 
class) of each class (fig. 2, items 202-205; "pre-defined set of classes" in 5:14-16; "[e]xamples of 
video classes include close-ups of people, crowd scenes, and shots of presentation material. . ." at 
5:16-20; "[s]hot [c]ategory" of TABLE 1 at col. 12) in the image sequence (fig. 2, item 201), 

processing the provided image sequence and computing the single set of model 
parameters at a same rate (e.g., the rate of performing the functions for an entire sequence per 
computer; see Response to Arguments above) that the image sequence is provided. 

In summary, the hidden Markov model of fig. 23 (probabilistic inference and learning)
uses/computes Gaussian distributions (which compute a single set of model parameters comprising
the mean visual appearance and variance of each class in the image sequence). Doing this
automatically decomposes the image sequence into the preferred number of classes of objects
shown in figs. 23 and 25 (classes A, B, G). The processing, computing, and providing given above
are all done per image sequence, and hence are all performed at a same rate.
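For illustration only, the HMM-based classification summarized above can be sketched as Viterbi decoding over class states whose observations follow Gaussian distributions. This minimal sketch assumes scalar per-frame features and already-learned means, variances, and transition probabilities; the class labels "G", "A", "B" are borrowed from fig. 23 as placeholders, and the function names and numbers are hypothetical. It does not reproduce Foote et al.'s actual features or implementation.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_gaussian(frames, means, variances, log_start, log_trans):
    """Most likely class (state) sequence for scalar frame features under an
    HMM whose states emit Gaussian-distributed observations."""
    n = len(means)
    # delta[s]: best log-probability of any state path ending in s at the current frame
    delta = [log_start[s] + gaussian_log_pdf(frames[0], means[s], variances[s])
             for s in range(n)]
    backptrs = []
    for x in frames[1:]:
        # best predecessor for each state s, given the previous frame's scores
        ptrs = [max(range(n), key=lambda p: delta[p] + log_trans[p][s]) for s in range(n)]
        delta = [delta[ptrs[s]] + log_trans[ptrs[s]][s]
                 + gaussian_log_pdf(x, means[s], variances[s]) for s in range(n)]
        backptrs.append(ptrs)
    # trace the best path backwards from the best final state
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]

# Hypothetical example: three classes with "sticky" transitions.
labels = "GAB"
means, variances = [0.0, 5.0, 10.0], [1.0, 1.0, 1.0]
log_start = [math.log(1 / 3)] * 3
log_trans = [[math.log(0.8) if i == j else math.log(0.1) for j in range(3)]
             for i in range(3)]
path = viterbi_gaussian([0.1, -0.2, 5.1, 4.8, 9.9, 10.2], means, variances, log_start, log_trans)
print("".join(labels[s] for s in path))  # GGAABB

# An ambiguous frame (2.6 is nearer mean 5.0 than 0.0) stays in class G because
# switching classes twice costs more under the transition model.
print(viterbi_gaussian([0.0, 0.0, 2.6, 0.0, 0.0], means, variances, log_start, log_trans))  # [0, 0, 0, 0, 0]
```

The second call shows where HMM decoding differs from classifying each frame independently: the transition probabilities discourage a one-frame change of class.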

Regarding claim 2, Foote et al. discloses the system of claim 1 wherein providing the 
preferred number of objects ("pre-defined set of classes" in 5:14-16) comprises specifying the 
preferred number of classes of objects via a user interface (a user interface is a visual interface
with which a user can interact, such as fig. 22; a pre-defined set of classes suggests that
some sort of user interface must have been used to "define" the set of classes; "[t]he feature used 
for classification are general, so that users can define arbitrary class types" in 5:18-20). 

Regarding claim 3, Foote et al. discloses the system of claim 1 wherein decomposing the
image sequence (fig. 2, item 201) into the preferred number of objects ("segmenting. . .into a
pre-defined set of classes" in 5:14-16) comprises automatically learning a 2-dimensional model
(fig. 3, items 310-322) of each object class (7:13-15).

Regarding claim 5, Foote et al. discloses the system of claim 1 wherein automatically 
decomposing the image sequence (fig. 2, item 201) into the preferred number of object classes 
("pre-defined set of classes" in 5:14-16) comprises performing an inferential probabilistic 
analysis (fig. 2, items 202-205; "Gaussian distributions" in 5, line 65-6, line 2) of each image 
frame for identifying ("segmenting . . . into a pre-defined set of classes" in 5:14-16) the preferred
number of object class appearances within the image sequence. 

Regarding claim 6, Foote et al. discloses the system of claim 5 wherein performing an 
inferential probabilistic analysis of each image frame comprises performing a variational 
generalized expectation-maximization analysis (21:55-62) of each image frame (6:17-19) of the 
image sequence (fig. 2, item 201), wherein the expectation-maximization analysis employs a 
Viterbi algorithm (6:43-45; 16:40-42) in a process of filling in values of hidden variables
(21:55-62; variables in fig. 4) in a model describing the object class.
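For illustration only, an expectation-maximization loop that uses the Viterbi algorithm to fill in hidden variables (commonly called Viterbi training or hard EM) can be sketched as follows. This minimal sketch assumes scalar frame features, uniform start probabilities, fixed transition probabilities, and a hypothetical variance floor `min_var`. It is a hard-assignment variant rather than a variational generalized EM analysis, and the names and numbers are hypothetical, not taken from Foote et al.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(frames, means, variances, log_trans):
    """Most likely hidden class per frame (uniform start probabilities assumed)."""
    k = len(means)
    delta = [log_gauss(frames[0], means[s], variances[s]) for s in range(k)]
    back = []
    for x in frames[1:]:
        ptrs = [max(range(k), key=lambda p: delta[p] + log_trans[p][s]) for s in range(k)]
        delta = [delta[ptrs[s]] + log_trans[ptrs[s]][s] + log_gauss(x, means[s], variances[s])
                 for s in range(k)]
        back.append(ptrs)
    state = max(range(k), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.append(state)
    return path[::-1]

def viterbi_em(frames, means, variances, log_trans, iterations=10, min_var=1e-3):
    """Hard EM: the E-step fills in each frame's hidden class with the Viterbi
    path; the M-step re-estimates each class mean and variance from its frames.
    Transition probabilities are held fixed in this sketch."""
    means, variances = list(means), list(variances)
    for _ in range(iterations):
        path = viterbi(frames, means, variances, log_trans)  # E-step (hard assignment)
        for c in range(len(means)):  # M-step
            assigned = [x for x, s in zip(frames, path) if s == c]
            if not assigned:
                continue  # no frames for this class: keep its previous parameters
            mu = sum(assigned) / len(assigned)
            means[c] = mu
            variances[c] = max(min_var, sum((x - mu) ** 2 for x in assigned) / len(assigned))
    # final decoding with the re-estimated parameters
    return means, variances, viterbi(frames, means, variances, log_trans)

# Hypothetical example: two classes, rough initial guesses.
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
means, variances, path = viterbi_em([0.0, 0.2, -0.2, 6.0, 6.2, 5.8],
                                    [1.0, 4.0], [1.0, 1.0], log_trans)
print(path)  # [0, 0, 0, 1, 1, 1]
```

Starting from rough guesses of 1.0 and 4.0, the loop moves the class means to about 0.0 and 6.0 and shrinks the variances to match the assigned frames.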

Regarding claim 14, Foote et al. discloses the system of claim 1 wherein automatically 
decomposing the image sequence into the preferred number of object classes comprises 
performing a probabilistic variational expectation-maximization analysis (21:55-62). 

Regarding claim 18, Foote et al. discloses the system of claim 1 further comprising a 
generative model ("hidden Markov model" in 18:35-42) which includes a set of model 
parameters ("alignment" in 18:35-42) that represent the entire image sequence ("entire video" in
18, line 37). 

Regarding claim 19, Foote et al. discloses the system of claim 1 further comprising a 
generative model which includes a set of model parameters that represent the images of the 
image sequence processed to that point (21:4-15).

Regarding claim 22, Foote et al. discloses the system of claim 19 further comprising 
automatically reconstructing a representation of the image sequence from the generative model, 
wherein the representation comprises the preferred number of object classes (fig. 2, item 207). 

Regarding claim 23, Foote et al. discloses a computer-implemented process (fig. 1; fig.
2) for automatically generating a representation of an object (e.g., "crowd" at TABLE 1, Col.
12) in at least one image sequence (fig. 2, item 201), comprising using a computer-readable
storage medium (fig. 1, items 103, 107-108) storing a program that when executed causes a
computer (fig. 1) to:

acquire at least one image sequence (fig. 2, item 201), each image sequence having at 
least one image frame (6:17-19); 

automatically decompose each image sequence (fig. 2, item 201) into a generative model 
(fig. 2, items 202-205; "Gaussian distributions" in 5, line 65-6, line 2), with each generative 
model comprising a set of model parameters (the single set of mean visual appearances and 
variances in the "Gaussian distributions" at 6:32-33) comprising the mean visual appearance and 
variance (e.g., "Gaussian distributions having different means and variances" at 7:59-60; fig. 4; 
i.e., the model parameters from the hidden Markov model comprise means and variances of each 
class) of each class (fig. 2, items 202-205; "pre-defined set of classes" in 5:14-16; "[e]xamples of 
video classes include close-ups of people, crowd scenes, and shots of presentation material. . ." at 
5:16-20; "[s]hot [c]ategory" of TABLE 1 at col. 12) in the image sequence (fig. 2, item 201) 
being decomposed using an expectation-maximization analysis (fig. 23; " hidden Markov model 
to be used in the method for classifying a video according to the present invention. Each of the 
image classes G, A, and B, are modeled using Gaussian distributions" (emphasis added) at 
16:49-55; computing posterior distribution of variables using the hidden Markov models; "[t]he 
similarity between a given frame and the query is computed during the Viterbi algorithm as the 
posterior probability of the query state or states" at 18:42-44) that employs a Viterbi analysis 
(6:43-45; 16:40-42), wherein each generative model is computed at a same rate (e.g., the rate of
performing the functions for an entire sequence per computer; see Response to Arguments 
above) that the at least one image sequence is acquired. 

Regarding claim 24, claim 2 recites identical features as in claim 24. Thus, 
references/arguments equivalent to those presented above for claim 2 are equally applicable to 
claim 24. 

Claim Rejections - 35 U.S.C. § 103 

[10] The following is a quotation of 35 U.S.C. § 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

Foote et al. in view of Petrovic et al.

[11] Claims 4, 7, and 27 are rejected under 35 U.S.C. § 103(a) as being unpatentable over
Foote et al. in view of Transformed Hidden Markov Models: Estimating Mixture Models of
Images and Inferring Spatial Transformations in Video Sequences, Computer Vision and
Pattern Recognition, 2000, Vol. 2, pg 26-33 (hereinafter "Petrovic et al.").

Regarding claim 4, while Foote et al. discloses the system of claim 3, Foote et al. does 
not directly suggest wherein the model employs a latent image and a translation variable in 
learning each object class. 

Petrovic et al. discloses a transformed hidden Markov model wherein the model employs a
latent image ("latent image", pg 27-28) and a translation variable ("set of transformations. . .", pg 
27, right column) in learning each object class. 

It would have been obvious to one of ordinary skill in the art at the time the invention 
was made for the model of Foote et al. to employ a latent image and a translation variable in 
learning each object class as taught by Petrovic et al. to "develop a general video analysis tool 
that extracts long and short term similarities in video using a novel generative model, called the 
transformed hidden Markov model (THMM).", Petrovic et al, pg 26 and to "learn models of 
different types of object from unlabeled frames in a video sequence that include background
clutter, occlusion and spatial transformations, such as translation, rotation and shearing.", 
Petrovic et al, pg. 26. 

Regarding claim 7, while Foote et al. discloses the system of claim 3, Foote et al. does 
not directly suggest wherein the model describing the object class employs a latent image and a 
translation variable in filling in said hidden variables. 

Petrovic et al. discloses a transformed hidden Markov model wherein the model describing
the object class employs a latent image ("latent image", pg 27-28) and a translation variable ("set 
of transformations. . .", pg 27, right column) in filling in hidden variables (pg 29). 

It would have been obvious to one of ordinary skill in the art at the time the invention 
was made for the model of Foote et al. to employ a latent image and a translation variable in 
filling in hidden variables as taught by Petrovic et al. to "develop a general video analysis tool 
that extracts long and short term similarities in video using a novel generative model, called the 
transformed hidden Markov model (THMM).", Petrovic et al, pg 26 and to "learn models of 
different types of object from unlabeled frames in a video sequence that include background 
clutter, occlusion and spatial transformations, such as translation, rotation and shearing.", 
Petrovic et al, pg. 26. 

Regarding claim 27, claim 4 recites identical features as in claim 27. Thus, 
references/arguments equivalent to those presented above for claim 4 are equally applicable to 
claim 27. 

Foote et al. in view of Jojic et al. 

[12] Claims 20-21 and 25-26 are rejected under 35 U.S.C. § 103(a) as being unpatentable 
over Foote et al. in view of Learning Flexible Sprites in Video Layers, Proc. of IEEE Conf. on
Computer Vision and Pattern Recognition, 2001, pg 1-8 (hereinafter "Jojic et al"). 

Regarding claim 20, while Foote et al. discloses the system of claim 19, Foote et al. does not
teach wherein the model parameters include: a prior probability of at least one object class; and 
means and variances of object appearance maps. 

Jojic et al. teaches learning flexible sprites in video layers, wherein the model
parameters include:
a prior probability of at least one object class ("prior probability p(c) of sprite class c",
pg 3); and

means and variances of object appearance maps ("means and variances of the sprite 
appearance maps", pg 3). 

It would have been obvious to one of ordinary skill in the art at the time the invention 
was made for system of Foote et al. to include wherein the model parameters include: a prior 
probability of at least one object class; and means and variances of object appearance maps as 
taught by Jojic et al. to "focus on learning the appearances of multiple objects in multiple layers, 
over the entire video sequence.", Jojic et al, pg 1 and to provide "probabilistic 2-dimensional
appearance maps and masks of moving, occluding objects.", Jojic et al, pg 1.
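For illustration only, model parameters of the kind quoted from Jojic et al. (a prior probability p(c) of each class and per-pixel means and variances of each class appearance map) can be combined with Bayes' rule to score which class best explains a frame. This minimal sketch assumes flattened grayscale images and an independent Gaussian per pixel for each class; it omits the masks, layers, and transformations of the flexible-sprite model, and the function name and numbers are hypothetical.

```python
import math

def class_posterior(image, priors, means, variances):
    """Posterior p(c | image) for a flattened image, assuming each class c has
    prior priors[c] and an independent Gaussian per pixel with mean means[c][i]
    and variance variances[c][i]."""
    log_joint = []
    for prior, mu, var in zip(priors, means, variances):
        # log p(image | c): sum of per-pixel Gaussian log densities
        log_lik = sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                      for x, m, v in zip(image, mu, var))
        log_joint.append(math.log(prior) + log_lik)
    # normalize with the log-sum-exp trick to avoid underflow
    top = max(log_joint)
    weights = [math.exp(lj - top) for lj in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 2-pixel example with two classes.
post = class_posterior([0.0, 0.0], priors=[0.5, 0.5],
                       means=[[0.0, 0.0], [1.0, 1.0]],
                       variances=[[0.1, 0.1], [0.1, 0.1]])
print(post[0] > 0.99)  # True
```

When both classes have identical appearance means and variances, the likelihood terms cancel and the posterior reduces to the priors. This is a quick way to check the normalization.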

Regarding claim 21, while Foote et al. in view of Jojic et al. discloses the system of claim 20,
Foote et al. in view of Jojic et al. do not teach wherein the model further comprises observation 
noise variances. 

Jojic et al. teaches learning flexible sprites in video layers, wherein the model
parameters include observation noise variances ("the observation noise variances P", pg 3).

It would have been obvious to one of ordinary skill in the art at the time the invention 
was made for system of Foote et al. to include wherein the model further comprises observation 
noise variances as taught by Jojic et al. to "focus on learning the appearances of multiple objects 
in multiple layers, over the entire video sequence.", Jojic et al, pg 1 and to provide
"probabilistic 2-dimensional appearance maps and masks of moving, occluding objects.", Jojic
et al, pg 1. 

Regarding claims 25 and 26, while Foote et al. discloses the computer-implemented 
process of claim 23, Foote et al. does not teach wherein the model parameters of each generative
model include

(i) an object class appearance map, 

(ii) a prior probability of at least one object class, and 

(iii) means and variances of that object class appearance map. 

Jojic et al. teaches learning flexible sprites in video layers, wherein the model
parameters include (i) an object class appearance map, (ii) a prior probability of at least one
object class, and (iii) means and variances of that object class appearance map (Section 5,
"Inference and Learning", first paragraph, pg 3).

It would have been obvious to one of ordinary skill in the art at the time the invention 
was made for each generative model of Foote et al. to include (i) an object class appearance 
map, (ii) a prior probability of at least one object class, and (iii) means and variances of that 
object class appearance map as taught by Jojic et al. to "focus on learning the appearances of 
multiple objects in multiple layers, over the entire video sequence.", Jojic et al, pg 1 and to 
provide "probabilistic 2-dimensional appearance maps and masks of moving, occluding
objects.", Jojic et al, pg 1. 

Allowable Subject Matter 
[12] Claims 8-13, 15-17, and 28-32 are objected to as being dependent upon a rejected base
claim, but would be allowable if rewritten in independent form including all of the limitations of 
the base claim and any intervening claims. 

Conclusion 

[13] Applicant's amendment necessitated the new ground(s) of rejection presented in this 
Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). 
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within TWO 
MONTHS of the mailing date of this final action and the advisory action is not mailed until after 
the end of the THREE-MONTH shortened statutory period, then the shortened statutory period 
will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 
CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event,
however, will the statutory period for reply expire later than SIX MONTHS from the date of this 
final action. 

[14] Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to DAVID P. RASHID whose telephone number is (571)270-1578 



Application/Control Number: 10/649,382 
Art Unit: 2624 



Page 13 



and fax number (571)270-2578. The examiner can normally be reached Monday - Friday 7:30 - 
17:00 ET. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Bhavesh Mehta can be reached on (571) 272-7453. The fax phone number for the 
organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the Patent
Application Information Retrieval (PAIR) system. Status information for published applications
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished
applications is available through Private PAIR only. For more information about the PAIR
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would
like assistance from a USPTO Customer Service Representative or access to the automated
information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/David P. Rashid/
Examiner, Art Unit 2624 

/Bhavesh M Mehta/
Supervisory Patent Examiner, Art Unit 2624

David P Rashid
Examiner